ABSTRACT

Most problems with tests have to do with their use,
misuse, or lack of use. Test scores can be of value to teachers who
know how and how not to use them. The purpose of this booklet is to
provide a brief overview of standardized testing and to explain some
of the commonly used terminology. Topics discussed include:
teacher-made and standardized tests, uses of standardized test
results, types of standardized tests, test validity, scores and
norms, derived scores (percentile ranks, grade equivalents, stanines),
working with student profiles, and aptitude test scores. The
bibliography is confined to a few readable sources which emphasize
the understanding and use of tests in more detail. (RC)

AN INTRODUCTION TO STANDARDIZED TESTING
FOR TEACHERS AND ADMINISTRATORS 

E. GARY JOSELYN 
UNIVERSITY OF MINNESOTA 



Testing, like so many things these days, has its extremists. On the one 
hand, there are those who believe all tests are worthless, unfair, and 
even damaging to schools and students. At the other end of the
continuum are those who place blind, unquestioning faith in test scores,
attributing almost magical qualities to them. The truth, I believe, lies
somewhere between. Most problems with tests have to do with their
use, misuse, or lack of use. Test scores can be of value to teachers who
know how and how not to use them.

The purpose of this booklet is to provide a brief overview of standard- 
ized testing and to acquaint you with some of the commonly used 
terms. The bibliography is confined to a few readable sources which
emphasize the understanding and use of tests in more detail. 



Teacher-made and Standardized Tests 

Teacher-made and standardized tests complement each other. Both are 
necessary for adequate evaluation of individual pupils and groups of 
pupils. Teacher-made tests are given quite often to monitor pupil and 
class learnings in rather specific areas that are the subject of recent 
classroom instruction. Their content is specific to the content of a 
particular classroom and reflects the specific objectives of the teacher.
Standardized tests, usually given only once a year or even less often, 
offer comparisons with external groups, in broader achievement areas. 
They provide standardized measures and are administered under care- 
fully prescribed conditions. 



Uses of Standardized Test Results 

Test results may be used for administrative, guidance, or instructional 
purposes.

Schools use standardized test scores for administrative purposes 
such as getting an overall picture of the level and range of abilities and
achievement of the student body, placing students in special groups, 
and evaluating curriculum. 

Guidance uses have as their principal objective greater
self-understanding on the part of individual students. Test scores, often used in
one-to-one interviews with guidance counselors, help students identify
their own strengths and weaknesses and make educational and vocational
plans.

Instructional uses are by classroom teachers for the purpose of im- 
proving and individualizing instruction. 

Only instructional uses are addressed in this booklet, but teachers 
should be aware that standardized test scores have many uses for many 
audiences in addition to classroom applications. 



Types of Standardized Tests 

Almost all tests may be categorized as one of four kinds: aptitude, 
achievement, interest, or personality. 

Aptitude tests are designed to measure a person's potential, that is, to
predict performance at some future time, to measure what a person can 
learn. 

Achievement tests indicate a person's present proficiency or what he 
has learned. 

Interest tests are designed to help students understand their own 
interests and how these may relate to various occupations or courses of 
study. 

Personality tests include a broad range of instruments that attempt to
describe how persons adjust to their environment. Since classroom
teachers seldom see or use interest inventories or personality tests,
they are not discussed here. The primary emphasis in this discussion is
on the use and interpretation of achievement test results with some 
brief attention given to aptitude tests. 



Test Validity 

The validity of a test is the degree to which that test measures what it
purports to measure. Achievement tests attempt to describe what a
person has learned. The validity of an achievement test, therefore, is
determined by carefully examining the content of the test and making a
judgment as to how adequately it samples the subject area. 

Aptitude tests attempt to predict a person's performance at some 
future time. Thus, the validity of aptitude tests is determined by studies
that investigate how closely performance on the test is related to later 
performance in the situation the test purports to predict. 

Teachers can usually assume that those who selected a particular 
test for a school's testing program studied the test carefully and are 
satisfied that it has good validity for use in the school. When using
achievement tests, however, it is important that teachers examine the
content of the test by looking at the item outline and the items
themselves and make their own judgments as to how closely the test reflects
the instructional goals and objectives of their subject areas. 



Scores and Norms 

A student's performance on a test is described by a test score. A raw 
score is simply the number of test items a student answered correctly,
or this number adjusted to correct for guessing. Raw scores have little 
meaning in themselves because tests vary in the number of items they 
have and in the difficulty of their items. To give them meaning, raw 
scores are converted to another type of score. Any test score other than
a raw score is called a derived score, and there are many different kinds.
(Some are discussed in the following section.) 
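
The adjustment for guessing mentioned above can be sketched in a few lines of Python. The booklet does not say which adjustment a given test uses; the sketch assumes the conventional formula (number right minus number wrong divided by one less than the number of answer choices), and the function name and the sample figures are hypothetical.

```python
def corrected_raw_score(num_right: int, num_wrong: int, choices_per_item: int) -> float:
    """Raw score adjusted for guessing (assumed conventional formula).

    Omitted items are neither right nor wrong and are not counted.
    """
    if choices_per_item < 2:
        raise ValueError("an item needs at least two answer choices")
    # Each wrong answer is treated as evidence of guessing and costs
    # 1 / (choices - 1) of a point.
    return num_right - num_wrong / (choices_per_item - 1)


# Hypothetical student: 40 right, 12 wrong, four-choice items.
print(corrected_raw_score(num_right=40, num_wrong=12, choices_per_item=4))  # 36.0
```

Either way, the result is still a raw score and has little meaning until it is converted to a derived score.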

Derived scores give meaning to a student's test performance by 
comparing it with the performance of some known group. The known 
group to which the test has been given and which supplies us with a
reference for evaluating the score of the individual is known as the norm
group. Knowledge of the norm group is obviously very important for the
proper interpretation of test scores. Although precise knowledge about
the make-up of a norm group is of vital concern to persons charged with
the responsibility of selecting a particular test for a school's testing
program, normally it is not of much concern to the classroom teacher.
The tests used in a school's testing program are usually chosen by
persons who study the tests and the norm group thoroughly, and
classroom teachers can usually trust that the norms are adequate and
appropriate. 

When interpreting scores, however, it is critical to keep in mind the
norm group to which the scores refer: 

National norms compare students' performances with those of a large
group of students selected to be representative of students at the same 
grade level throughout the nation. 

Local norms compare students' performances with those of their class- 
mates of the same grade level in the same school system. 

Sometimes scores are based on other norm groups. In addition to 
commonly used national and local norms, you may run across norms 
based upon students of a particular region or state, students of dif- 
ferent levels of ability, or students in particular kinds of schools (large- 
small, urban-suburban, public-private). 

Remember that all scores (except raw scores) are tied to some norm
group and therefore describe relative, not absolute, performance. 



Some Derived Scores



Three of the most commonly used derived scores are grade equivalents, 
percentile ranks, and stanines, which are described below. 

Grade-equivalent (GE) scores: Grade-equivalent scores, sometimes
called grade-level scores, represent the most common method of
reporting performance on achievement tests. GE scores show the average
score for students at a particular grade level. If, for example, the
average raw score for all students in the norm group taking the test at
the beginning of the sixth grade is 51, then 6.0 becomes the GE score
for a score of 51. The first digit in the GE score is the grade level and the
second is the month. A grade equivalent of 4.3, for example, represents 
the average performance of students in the third month of the fourth 
grade. 
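
As a rough illustration of how a raw score becomes a GE score, the Python sketch below looks a raw score up in a small norm table. Only the pair (6.0, 51) comes from the example above; every other entry is invented, and the straight-line interpolation between tabled points is an assumption, not a description of how any publisher builds its tables.

```python
# Hypothetical norm table: (grade equivalent, average raw score of the norm
# group tested at that point). Only (6.0, 51) comes from the text.
NORM_AVERAGES = [(5.0, 43), (5.5, 47), (6.0, 51), (6.5, 54), (7.0, 57)]


def grade_equivalent(raw_score: float) -> float:
    """Return the GE whose norm-group average matches the raw score."""
    points = sorted(NORM_AVERAGES, key=lambda point: point[1])
    # Scores outside the table are clamped to its ends (a simplification).
    if raw_score <= points[0][1]:
        return points[0][0]
    if raw_score >= points[-1][1]:
        return points[-1][0]
    # Interpolate linearly between the two neighboring tabled points.
    for (ge_low, raw_low), (ge_high, raw_high) in zip(points, points[1:]):
        if raw_low <= raw_score <= raw_high:
            fraction = (raw_score - raw_low) / (raw_high - raw_low)
            return round(ge_low + fraction * (ge_high - ge_low), 1)
    raise AssertionError("unreachable: score lies inside the table range")


print(grade_equivalent(51))  # 6.0
```

A different test with a different norm group would have a different table, so the same student could receive a different GE score on it.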

One reason for the popularity of the GE scores is that they seem easy 
to interpret. However, they are often misinterpreted, and teachers 
should remember the following points: 

1. Different achievement batteries are written by different authors and 
are published by different publishers who sample different groups of 
students to make up their norm groups. One should not expect,
therefore, that a student taking two different tests of the same kind
(reading, for example) will necessarily receive the same GE score on 
both. 

2. One should not interpret a GE score to mean that a student should be
given learning materials designed for that particular level. It cannot
be assumed that a fifth grade student who receives a GE score of 7.0
should be promoted to the seventh grade.

3. A common and easy-to-make misinterpretation of GE scores is to
assume that identical GE scores on two different subtests represent 
equivalent performance on each as compared with other students in 
the same grade. For example, a fifth grade student who achieves a 
GE of 7.0 on both the reading and arithmetic tests of an achievement 
battery would seem to have performed equally well in both subject 
areas. Actually, while he got as many items right as the average
student at the beginning of the seventh grade on both tests, his
performance on the arithmetic test is considerably better than on the
reading test as compared with fifth grade students. This is because
the spread of scores is different for almost every subtest on an
achievement battery. While it is not necessary for a teacher to know
the exact amount of the difference in the spread of scores for each
subtest, he should know that these differences exist and he should
avoid the conclusion that equal GE scores indicate equivalent
performance in two different subject areas.

4. Finally, teachers must resist the temptation to use GE scores as
standards of performance. We often hear people say "Forty percent
of our students are reading below grade level" or "bring everyone up
to grade level." Such statements imply that it is bad if anyone scores
below grade level. They reveal ignorance of the fact that GE scores
represent the average performance of students at a particular grade
level and, by definition, half of any group must be "below average."
With the accountability movement gaining momentum, it is important
that teachers help both parents and their fellow teachers understand
that GE scores represent average performance and, therefore, cannot
be used as evaluative standards of performance.
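
The third point above, that equal GE scores can hide unequal standing among classmates, can be made concrete with a short Python sketch. All of the figures are hypothetical, and the sketch assumes that fifth grade scores on each subtest are roughly normally distributed, which the booklet does not claim.

```python
from statistics import NormalDist


def percentile_rank_in_grade(raw_score: float, grade_mean: float, grade_sd: float) -> int:
    """Approximate percentile rank of a raw score among same-grade students,
    assuming a normal distribution with the given mean and spread."""
    return round(100 * NormalDist(grade_mean, grade_sd).cdf(raw_score))


# Hypothetical fifth grader whose raw score of 50 on each subtest equals
# the seventh grade average, so both GE scores are 7.0.
# Hypothetical fifth grade norms: same mean on both subtests, but reading
# scores spread out twice as widely as arithmetic scores.
reading_pr = percentile_rank_in_grade(50, grade_mean=40, grade_sd=10)
arithmetic_pr = percentile_rank_in_grade(50, grade_mean=40, grade_sd=5)

print(reading_pr, arithmetic_pr)  # 84 98
```

With these invented norms the identical GE scores of 7.0 correspond to quite different standings among fifth graders, because the narrower spread in arithmetic puts the same raw score further above the rest of the grade.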

Percentile Ranks (PR): Of all the different derived scores, the PR is
probably least subject to misinterpretation. A student's percentile rank
represents the percentage of students in the norm group who received
the same or a lower score. A PR of 65, for example, indicates that the
student performed as well as or better than 65 percent of the norm group.
And, of course, 35 percent scored higher. It is important to keep in mind
that percentile rank scores represent percentage of students in the norm
group, not percentage of items answered correctly.
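
The definition above translates directly into a short Python sketch. The norm group of twenty raw scores and the 50-item test length are invented for illustration; real norm groups are far larger.

```python
def percentile_rank(score: float, norm_scores: list[float]) -> int:
    """Percentage of the norm group that received the same or a lower score."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return round(100 * at_or_below / len(norm_scores))


# Hypothetical norm group: raw scores on a hypothetical 50-item test.
NORM_GROUP = [12, 15, 18, 20, 22, 23, 25, 26, 27, 28,
              29, 30, 31, 33, 34, 36, 38, 40, 43, 47]

student_raw = 31
print(percentile_rank(student_raw, NORM_GROUP))  # 65
print(round(100 * student_raw / 50))             # 62 (percent of items correct)
```

The two printed numbers answer different questions: the first places the student within the norm group, while the second only counts items.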

One problem with percentile rank scores is that they do not reflect the
fact that academic achievement scores tend to bunch near the average
score in the middle and spread out toward the high and low extremes.
There may be a tendency, therefore, to place too much importance on
percentile rank differences near the middle of the range and to place too
little importance on differences near the extremes. Percentile ranks of
50 and 55 probably represent insignificant differences in performance in 
terms of the number of items answered correctly, while percentile ranks 
of 90 and 95 do represent significantly different levels of performance. 
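
A small Python sketch can show why five percentile points mean less in the middle than near the top. It assumes, purely for illustration, that scores follow a normal distribution, and it expresses each gap in standard deviation units of the underlying score scale.

```python
from statistics import NormalDist


def score_gap_in_sd_units(pr_low: float, pr_high: float) -> float:
    """Distance between two percentile ranks on the underlying score scale,
    in standard deviation units, assuming normally distributed scores."""
    standard_normal = NormalDist()
    return standard_normal.inv_cdf(pr_high / 100) - standard_normal.inv_cdf(pr_low / 100)


middle_gap = score_gap_in_sd_units(50, 55)
upper_gap = score_gap_in_sd_units(90, 95)

print(round(middle_gap, 2))  # 0.13
print(round(upper_gap, 2))   # 0.36
```

Under this assumption the step from 90 to 95 covers nearly three times as much of the score scale as the step from 50 to 55.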



Stanines: Many achievement test reports include another type of score
called a stanine. The name comes from standard scores of nine units.
Stanine scores have several advantages. Each stanine value represents
approximately equal ranges of scaled scores, which avoids the problem
of overemphasizing small, insignificant differences in the middle of the
range that could appear as large differences when expressed as
percentile ranks. The statistical characteristics of stanine scores are such
that one may, with a fair amount of confidence, interpret a difference of
two stanine units between the scores on two tests as representing true
differences in performance.
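
The relationship between percentile ranks and stanines can be sketched in Python. The cutoffs below are the conventional ones (the nine stanines take in roughly 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the norm group); they are not given in this booklet, and a particular test's manual may differ slightly. The function names are hypothetical.

```python
from bisect import bisect_right

# Lowest percentile rank belonging to stanines 2 through 9
# (conventional cutoffs, assumed here).
STANINE_LOWER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank: int) -> int:
    """Convert a percentile rank (1 to 99) to a stanine (1 to 9)."""
    return bisect_right(STANINE_LOWER_BOUNDS, percentile_rank) + 1


def is_true_difference(stanine_a: int, stanine_b: int) -> bool:
    """Rule of thumb from the text: two or more stanine units apart."""
    return abs(stanine_a - stanine_b) >= 2


print(stanine_from_percentile(50), stanine_from_percentile(55))  # 5 5
print(is_true_difference(stanine_from_percentile(45),
                         stanine_from_percentile(80)))           # True
```

Percentile ranks of 50 and 55 fall in the same stanine, so the small difference between them is not exaggerated.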

Working with Student Profiles 

Achievement battery scores for individual students or groups of
students are usually shown graphically on profiles which provide a visual
display of a person's or a group's overall level of achievement and
particular strengths and weaknesses. Profiles of percentile rank scores
may be plotted on scales on which the distance between percentile rank
points is collapsed in the middle of the scale and expanded near the
extremes. This helps to avoid the problem of misinterpretation of score
differences in the middle and near the extremes that was discussed
previously.

When interpreting a student's achievement battery profile, look first
at the overall level of the scores. Although almost every student scores
better in some areas than in others, there is a tendency for the scores of
individuals to fall fairly close together. How does the overall level of
measured achievement fit with your expectations, based on your
knowledge of your students' performance in classes and on other
measures?

Teachers usually find that their predictions of students' test
performances are fairly accurate. But occasionally a teacher finds that
scores on the profile are quite different from what was expected. It is at
these times that standardized test results may serve their most useful
purpose. Testing may be worth the effort and expense if even one quiet,
low-achieving student shows up much higher than expected on the test,
is thereby brought to the attention of the teacher, and is motivated to
achieve his full potential.

Next, look at the peaks and valleys in the profile. Notice the areas in
which the student seems to be particularly strong or weak and consider
their implications for planning and instruction so the strengths may be
capitalized upon and the weaknesses remedied.



Aptitude Test Scores

Because there are substantial and significant individual differences in
learning ability, school instruction is almost always preceded by some
effort to judge the capacity of students to learn. Just as some people
are taller than others, some can run faster, and some have a better ear
for music, so, too, some persons learn school subjects more easily
than others. Individual differences in learning ability do exist, and
teachers must take these differences into account if they are to fulfill
their obligation to meet the unique needs of each student.

Tests of learning ability are often called "intelligence" or "scholastic
aptitude" tests. It has been well documented that scores on these tests
are related to school performance. Although psychologists continue to
struggle to define intelligence and to debate the nurture vs. nature
issue, teachers who use scholastic ability test scores will be better
served if they think of them rather narrowly and simply as indicators of
future academic performance.

IQ Scores: Scholastic ability test performance is most often reported
as percentile rank scores, IQ scores, or both. The concept of the
intelligence quotient comes from the time when intelligence was defined as
the quotient obtained by dividing a person's mental age by his
chronological age. Today, IQ scores are no longer calculated in this
manner, but the name persists despite rather general agreement among
test experts that our schools and students would be best served if IQ
scores were done away with. Scores on present IQ tests are simply
standard scores with an average and spread that approximate those of
the scores found on earlier IQ tests.
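
The two ways of arriving at an IQ described above can be contrasted in a brief Python sketch. The older ratio is conventionally multiplied by 100, and the standard-score version below assumes an average of 100 and a spread (standard deviation) of 15; neither figure is stated in this booklet, and the spread varies somewhat from test to test. All names and sample values are hypothetical.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> int:
    """Older definition: mental age divided by chronological age, times 100."""
    return round(100 * mental_age / chronological_age)


def standard_score_iq(raw_score: float, norm_mean: float, norm_sd: float,
                      iq_mean: float = 100.0, iq_sd: float = 15.0) -> int:
    """Present-day approach: a standard score placed on an assumed scale
    with an average of 100 and a spread of 15."""
    z = (raw_score - norm_mean) / norm_sd  # distance from the norm-group average
    return round(iq_mean + iq_sd * z)


print(ratio_iq(mental_age=12, chronological_age=10))               # 120
print(standard_score_iq(raw_score=58, norm_mean=50, norm_sd=8))    # 115
```

The second function makes plain that a present-day IQ is only a relabeled standard score, tied like any other derived score to a norm group.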

While the norm group for percentile rank and most other derived
scores is usually made up of other students at the same grade level, the
norm group for IQ scores consists of other students of the same age.
This in itself diminishes the value of IQ scores for schools because the
most important variable affecting what happens to a student in school
is his grade placement, not his age. There are many other difficulties
with IQ scores, most of which are too complex to deal with in this
space. Perhaps it is sufficient to say that teachers should ignore IQs
and direct their attention to percentile ranks or stanine scores whenever
possible.

The greatest value of learning ability tests is that they may call
attention to the few students who have unexpected discrepant scores. There
are two kinds of discrepancies in which teachers should be most
interested: students whose measured learning ability is quite different
from their school achievement (the so-called underachievers and
overachievers) and students whose abilities are very different from those of
their classmates.

Learning ability test scores are rough indicators and should serve
mainly as clues which stimulate further, more intensive diagnosis.
Remember that low measured scholastic ability which has been
substantiated by other indicators does not mean a student cannot learn.
Every student can learn. Low ability means that there may be limitations
to the rate of learning and the complexity of material that can be
learned. Low test scores are not telling us that these students are
doomed to fail. They are telling us, however, that they will surely fail
unless they are treated differently from average students. By the same
token, students with extremely high scholastic ability may very likely
become disenchanted with schooling and either withdraw or become
discipline problems unless they are treated differently from average
students.



What I Have Not Talked About 

In an effort to keep this booklet short, I have not included a number of
other possible topics such as scoring (because today most standardized
tests are machine-scored), test administration (because most
instructions for administration furnished with testing materials cover these
procedures precisely for each test), and statistical concepts and
definitions (because adequate explanation would take too much space and
because such knowledge is not essential to good use of test scores by
teachers). There are many excellent books and articles on these and
other aspects of tests and test interpretation, some of which are
included in the list of suggested readings on the following page. Finally,
teachers are urged to talk with the person in the school who is
responsible for testing to learn more about test interpretation and about their
own school's tests.



Suggested Readings

The following two books provide readable and extensive coverage of the
use and interpretation of standardized tests and of the construction of
classroom tests:

Gronlund, N.E. Measurement and evaluation in the classroom. 2nd Edition.
New York: The Macmillan Co., 197^.

Mehrens, W.A., & Lehmann, I.J. Measurement and evaluation in education
and psychology. New York: Holt, Rinehart, and Winston, 1973.

The following publications are part of the Measurement in Education
series of the National Council on Measurement in Education. These 
short (8-10 pages) monographs are concerned with the practical impli- 
cations of educational measurement, emphasizing uses of measure- 
ment rather than technical or theoretical issues. They are available at 35 
cents each from:

Office of Evaluation Service
Michigan State University
East Lansing, Michigan 48823



Airasian, P.W., & Madaus, G.F. Criterion-referenced testing in the
classroom. Vol. 3, No. 1. 

Coffman, W.E. On the reliability ratings of essay examinations. Vol. 3, 
No. 2. 

Cureton, L.W. The history of grading practices. Vol. 2, No. 4. 

Ebel, R.L. Shall we get rid of grades? Vol. 5, No. 4. 

Gardner, E.F. Interpreting achievement profiles— uses and warnings. 
Vol. 1, No. 2 

Joselyn, E.G., & Merwin, J.C. Using your achievement test score
reports. Vol. 3, No. 1.

Mayo, S.T. Mastery learning and mastery testing. Vol. 1, No. 3.

Tyler, R. Assessing educational achievement in the affective domain.
Vol. 4, No. 3.

Warrington, W.G. An item analysis service for teachers. Vol. 3, No. 2.